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Abstract 

A parallel-in-time algorithm based on an augmented Lagrangian ap¬ 
proach is proposed to solve four-dimensional variational (4D-Var) data 
assimilation problems. The assimilation window is divided into multiple 
sub-intervals that allows to parallelize cost function and gradient com¬ 
putations. Solution continuity equations across interval boundaries are 
added as constraints. The augmented Lagrangian approach leads to a 
different formulation of the variational data assimilation problem than 
weakly constrained 4D-Var. A combination of serial and parallel 4D-Vars 
to increase performance is also explored. The methodology is illustrated 
on data assimilation problems with Lorenz-96 and the shallow water mod¬ 
els. 
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1 Introduction 

Predicting the behavior of complex dynamical systems, such as the atmosphere, 
requires using information from observations to decrease the uncertainty in the 
forecast. Data assimilation combines information from a numerical model, prior 
knowledge, and observations (all with associated errors) in order to obtain an 
improved estimate of the true state of the system. Data assimilation is an 
important application of data-driven application systems (DDDAS [5], or In- 
foSymbiotic systems) where measurements of the physical system are used to 
constrain simulation results. 

Two approaches to data assimilation have gained widespread popularity: 
variational and ensemble-based methods. The ensemble-based methods are 
rooted in statistical theory, whereas the variational approach is derived from 
optimal control theory. The variational approach formulates data assimilation 
as a nonlinear optimization problem constrained by a numerical model. The ini¬ 
tial conditions (as well as boundary conditions, forcing, or model parameters) 
are adjusted to minimize the discrepancy between the model trajectory and a 
set of time-distributed observations. In real-time operational settings the data 
assimilation process is performed in cycles: observations within an assimilation 
window are used to obtain an optimal trajectory, which provides the initial con¬ 
dition for the next time window, and the process is repeated in the subsequent 
cycles. The variational methodology is widely adopted by most national and 
international numerical weather forecast centers to provide the initial state for 
their forecast models. 

Performing nonlinear optimization in the 4D-Var framework is an inherently 
sequential process. Computer architectures progressively incorporate more par¬ 
allelism, while maintaining a constant processor speed. As our understanding of 
the physics improves the computer models become increasingly more complex. 
Advanced and scalable parallel algorithms to solve 4D-Var need to be devel¬ 
oped to continue to perform data assimilation in real time. This challenge has 
been partially addressed by exploring parallelism in spatial dimension. Tremolet 
and Le Dimet |30| have shown how variational data assimilation can be used 
to couple models and to perform parallelization in space for the assimilation 
process. Rantakokko m considers different data distribution strategies to per¬ 
form parallel variational data assimilation in the spatial dimension. A scalable 
approach for three dimensional variational (3D-Var) data assimilation is pre¬ 
sented in |12j and parallelism is achieved by dividing the global problem into 
multiple local 3D-Var sub-problems. Multiple copies of modified 3D-Var prob¬ 
lem, which ensures feasibility at the boundaries of the sub-domains, are solved 
across processors and the global 3D-Var minimum is obtained by collecting the 
local minima. 

An important challenge associated with computing solutions to 4D-Var prob¬ 
lem is parallelization in the temporal dimension. Fisher m attempts to address 
this challenge by the saddle point formulation that solves directly the optimality 
equations. Aguiar et al. TJ apply the augmented Lagrangian approach to con¬ 
strained inference problems in the context of graphical models. The approach 
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proposed herein uses the augmented Lagrangian framework in the context of 
4D-Var data assimilation. The most computationally expensive components of 
4D-Var, namely the cost function and gradient evaluations, are performed in a 
time-parallel manner. 

The remainder of this paper is organized as follows: Section [2] introduces 
data assimilation and the traditional 4D-Var approach. Section [3] formulates 
the 4D-Var problem in an augmented Lagrangian framework to expose the time 
parallelism in the cost function and gradient evaluations. Section [4] gives a 
detailed description of the parallel assimilation algorithm. Section [5] shows the 
numerical results with the small, chaotic Lorenz-96 model, and a relatively large 
shallow water on the sphere model. Concluding remarks and future research 
directions are discussed in Section [6] 

2 Four-dimensional variational data assimilation 

Data assimilation (DA) is the fusion of information from priors, imperfect model 
predictions, and noisy data, to obtain a consistent description of the true state 
x true a phygi ca [ system HU ED H3 EH- The best estimate that optimally 
fuses all these sources of information is called the analysis x a . 

The prior information encapsulates our current knowledge of the system. 
Usually the prior information is contained in a background estimate of the state 
x b and the corresponding background error covariance matrix B. 

The model captures our knowledge about the physical laws that govern the 
evolution of the system. The model evolves an initial state xo £ K" at the initial 
time t 0 to future states x k € at future times iW. A general model equation 
is represented as follows: 


x k = M to ^ t m (x 0 ) . (1) 

Observations are noisy snapshots of reality available at discrete time in¬ 
stances. Specifically, measurements y k € R m of the physical state x true (fW) 
are taken at times , k = 1, • • • ,N. The model state is related to observations 
by the following relation: 

y k = ft (x k ) - £k bs , k = 1, • • • ,N, (2) 

obs _ representativeness . measurement 

£ k — t k “I" e k 

The observation operator % maps the model state space onto the observation 
space. The observation error term (e k bs ) accounts for both measurement and 
representativeness errors. Measurement errors are due to imperfect sensors. The 
representativeness errors are due to the inaccuracies of the mathematical and 
numerical approximations inherent to the model. 

Variational methods solve the data assimilation problem in an optimal con¬ 
trol framework, where one finds the control variable which minimizes the mis¬ 
match between the model forecasts and the observations. Strong-constraint 
4D-Var assumes that the model ([!]) is perfect [27, 26] . The control parameters 
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are the initial conditions xo, which uniquely determine the state of the system 
at all future times via the model equation (JT|) . The background state is the 
prior best estimate of the initial conditions xjj, and has an associated initial 
background error covariance matrix B 0 . Observations at fUl have the corre¬ 
sponding observation error covariance matrices Rk, k = 1, ■ ■ ■ , N. The 4D-Var 
problem provides the estimate xjj of the true initial conditions as the solution 
of the following optimization problem 

Xg = arg min J (x 0 ) subject to 0 , (3) 

X 0 

with the following cost function: 

J (x 0 ) = i (x 0 - 4) T B- (x 0 - x£) (4) 

1 N 

( Hk ( Xk ) _ R k 1 (^ k ( Xk ) - yk) • 

Z k=l 

The first term of the sum 0 quantifies the departure of the solution x 0 from 
the background state x k at the initial time to- The second term measures the 
mismatch between the forecast trajectory (model solutions Xk) and observations 
yk at all times fUl in the assimilation window. The weighting matrices Bo 
and Rk need to be predefined, and their quality influences the accuracy of the 
resulting analysis. 

Weak constraint 4D-Var m removes the perfect model assumption by al¬ 
lowing a model error ? 7 k+i = Xk+i — Adk,k+i ( x k)- Under the assumption that 
the model errors are normally distributed, rj k € J\f(0, Qk), the weak constraint 
4D-Var solution is the unconstrained minimizer of the cost function 

J weak (x 0 ,...,x^) = J(x 0 ) + (5) 

1 JV_1 T 

2 'y "j ( x k+l — A4k,k+1 ( x k)) Qk+l ( x k+l — -Adk.k+l ( x k)) ■ 

k-0 

The control variables are the states of the system at all times in the assimilation 
window. 

In this paper we focus on the strong constraint formulation 0. The mini¬ 
mizer of 0 is computed iteratively using gradient-based numerical optimization 
methods. First-order adjoint models provide the gradient of the cost function 
[E|, while second-order adjoint models provide the Hessian-vector product (e.g., 
for Newton-type methods). The methodology for building and using various 
adjoint models for optimization, sensitivity analysis, and uncertainty quantifi¬ 
cation is discussed in IS1I1S]- Various strategies to improve the the 4D-Var data 
assimilation system are described in m- The procedure to estimate the impact 
of observation and model errors is developed in J2ZL HI] ■ A framework to perform 
derivative free variational data assimilation using the trust-region framework is 
given in [?5| . 
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The iterative solution of ([3]) is highly sequential: first, one iteration follows 
the other; next, the forward and adjoint models are run sequentially forward 
and backward in time, respectively. In order to reveal additional parallelism the 
solution to 4D-Var problem is is approached using the augmented Lagrangian 
framework. In this framework, the assimilation window is divided into multiple 
sub-intervals, and the model constraints are explicitly imposed at the bound¬ 
aries. This approach bears similarities with the Parareal approach that exploits 
time parallelism in the solution of ordinary differential equations [23 ;. 

3 4D-Var solution by the augmented Lagrangian 
approach 

The 4D-Var cost function § is minimized subject to the generic model con¬ 
straints ([l]). The model equations can also be written as 

x k+ i = A4 k ,k+i (x k ) , k = 0, 1, ■ • • ,N — 1 , (6) 

where A4 k ,k+i represents the model solution operator that propagates the state 
Xfc at tk to the state x^+i at tk+i- The minimizer of Q under the constraints 
@ is the unconstrained minimizer of the Lagrangian 

N—1 

L(x 0 ,..., xjv; A 0 ,..., Xn) = y(x 0 ) - ^ Ajf +1 • (x k+ i - AI k ,k+i (x k )) (7) 

k=0 

To expose time parallelism the assimilation window is divided into N sub¬ 
intervals, namely, 

[io>*iv] = [io4i] U ... U [tjv-ijiiv]- (8) 

The forward model and adjoint model states at the interval boundaries are 
denoted by 

x = [x 0 , • • • , xjy], A = [A 0 , • • • , Ajv], (9) 

respectively. We denote by x a the optimal solution and by A a the optimal value 
of the Lagrange multiplier in |t| . 

The augmented Lagrangian [31, Section 17.3] associated with Q and 0 
reads: 

= \ ( x o -x^) T Bq 1 (x 0 -x£) (10) 

i n 

( n ( Xk ) ~ yk) 1 R k 1 ( n ( x k) - yk) 

z k=l 

N—1 

- 5Z A k +1 ( x k+l - Mk,k +1 (x k )) 

k=0 
N—1 

+ 2 5Z ( Xk + 1 - •A^k.k+i (x k )) T P k +i (x k+ i - A4 k ,k+i (x k )) , 

Z k=0 


£• ( x , A, fi) 
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where Pk’s are error scaling matrices. This is the (regular) Lagrangian for the 
problem that minimizes the cost function 


jaugme„ted (x()) _ ;Xjv) = j ( Xq ) + 


( 11 ) 


N-l 


n 'y ' ( x k+l Afk.k+l ( x k)) Pk+1 ( x k+l kV'lkjk+l ( x k)) 


k =0 


subject to the model constraint ([6]). The constrained minimization of is 
equivalent to([3| since the additional term is zero along the constraints. Note 
that (11) is a constrained minimization problem, unlike ([5]), which is uncon¬ 
strained. 

The original 4D-Var problem in ([3| is solved in the augmented Lagrangian 
framework by performing a sequence of unconstrained minimizations 


xl f l = arg min C 


(x,A W ,m W ), £ = 0,1,.... 


( 12 ) 


If AW ss A a then xW « x a , and the solution error decreases with increasing p 
|SU Section 17.3]. 

The optimization proceeds in cycles of inner and outer iterations. Inner 
iterations solve the optimization problem (12) for particular values of p^ and 
A^l. After each solution of (12) the outer iteration i is completed by updating 


the Lagrange multiplier approximation and the penalty parameter, as follows: 


= pp w 


(13a) 


A P+ 1} = Af } - At «(xf } -M k _i,k(4i } 1 )) , k = 0,...,N. (13b) 


The penalty parameter is progressively increased by a constant p > 1 in order 
to impose the model constraints ([b]). Different other strategies to update p and 
A can be used [5] . 


3.1 Augmented Lagrangian optimization 

Figure [T] illustrates the convergence process of the augmented Lagrangian 4D- 
Var. The assimilation window is divided into multiple sub-intervals ([8|. The 
control vector x for the optimization process contains the state vector dat the 
beginning of each of the sub-intervals ([8]). The initial value of the control vector 
contains the background states at the beginning of each of the sub-interval, 
and is therefore a continuous curve. The outer iterations start with a small 
value of p that only imposes the constraints (|6| loosely. Consequently, the solu¬ 
tions during the first outer iterations show large discontinuities at the interval 
boundaries. Subsequently, p is increased and as a result the constraints are 
satisfied accurately. This results in a smooth solution curve resembling closely 
the serial 4D-Var solution. 
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4 The parallel algorithm 


Most of the computational time required by the 4D-Var solution is claimed 
by the multiple cost function and gradient evaluations. In the augmented La- 
grangian 4D-Var formulation (12)- (13) the forward and adjoint models can be 
run in parallel over sub-intervals, as explained next. 


4.1 Parallel-in-time runs of the forward model 

The value of the augmented Lagrangian cost function ([lO ) can be computed in 
parallel. Specifically, on each sub-interval [fhl, tl^ +1 l] in (8j) a forward solution 
A4k,k+i (xk) is computed starting from the initial value Xk ([9]). The forward 
model runs on each sub-interval can be carried out concurrently. The compu¬ 
tational steps are detailed in Algorithm [l] 


Algorithm 1 ParallcLCostFunction 


1 

procedure Parallel_CostFunction 


2 

Input: [x 0 ,...,x N ; Ai,...,A n ] 


3 

Output: C 


4 

C <- 0 


5 

For all 0 < k < N — 1 do in parallel 


6 

Ax k +i <- x k+ i - A4 k ,k+i ( x k) 

> Solution mismatch at 
sub-interval boundaries 

7 

Ay k +i <- U (x k +i) - yk+i 

> Solution mismatch with 
observations 

8 

end For all 


9 

for 0<k<A — ldo 

> Evaluate the cost func¬ 
tion using (TtO} 

1 '—' 

Ax k+ i + - Ay k+ i R k +! Ay k+ i 

10 

C, C + — Ax k+1 P k+1 Axk+i — A k+1 

11 

end for 

£ ^ i (x 0 - x|j ) 1 B^ 1 (x 0 - x£) 


12 


13 

end procedure 
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4.2 Parallel-in-time runs of the adjoint model 

The gradient of the augmented Lagrangian ( [To| with respect to the forward 
model state is given by: 


V Xo £ = B 0 1 (x 0 - Xq) + MjiAi - /iMj a P 0 1 (xi - M 0 ,i (x 0 )) , (14a) 


V Xk £ = HjR- 1 (H(x k )-y k )-Aj + MT k+1 A k+1 (14b) 

4/iP k _i ( x k A4 k _i jk (x k _i)) 

—M M k : k+1 p k 1 ( X k+1 - •A'h.k+i ( x k)) , k = 1,. .., N - 1, 
V Xn £ = H^R" 1 (?* (x N ) - y N ) - A^ (14c) 

+/ iP N-l ( X N — A4 n- 1,N ( x N-l)) , 

where the tangent linear operators of the observation and model solution oper¬ 
ators are 


H k = 




<9x k 


M 


dM 


k,k+l = 


k,k+l 


dx k 


respectively, and their transposes are the corresponding adjoint operators. Using 
the notation 


Ax k := x k - A4 k -i,k ( x k-i) 


Ay k := H (x k ) - y k 


the gradient (14) 


Vx 0 >C = 


Vx k U = 


V XN A — 


can be written as 

Bq 1 ( x o - x o) + M^Ai - n Mjj Pr 1 Axi , 

H k R k A y k - A k + M k)k -(-X (Ak+1 “ ^PkxiAXk+l) 
+M P k 1 Ax k , k = 1, 

H^Rn 1 AyN ~ A^ + i-i PN- i Ax n • 


The augmented Lagrangian gradient (141 can be evaluated in parallel. On 
each sub-interval [t^, fh +1 l] in Q an adjoint solution M k k+1 A k+ i is computed 
starting from the terminal value A k +i Q. The adjoint model runs on each sub¬ 
interval can be carried out concurrently. The computational steps are detailed 
in Algorithm [2j 


4.3 Initial solution guess 

The optimization needs to start with some initial guess for xl°l and A^ 0 !. The 
initial guess for the state is obtained by performing a serial forward integration 
using the background value for initial conditions, x k °^ = x k . The initial value 
for the adjoint variable could be obtained by running once the adjoint model 
once serially along the background trajectory. In our experiments we choose the 
simpler, and less expensive, initialization A k °^ = 0. 
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Algorithm 2 ParallcLGradient 

1 

procedure Parallel.Gradient 


2 

Input: [x 0) ...,x N ; Ai,...,A n ] 


3 

Output: [V Xo £,..., V Xn £] 


4 

For all 0 < k < TV — 1 do in parallel 


5 

Axk+i t— x k+ i — At k ,k+i (xk) 

> Solution mismatch at 



sub-interval boundaries 

6 

Ay k +i «- H (x k +i) - y k +i 

> Innovation vector 

7 

end For all 


8 

For all 0 < k < TV — 1 do in parallel 


9 

b k +i t /i PAxk+i — Ak+i 

> Translate and scale solu- 



tion mismatch 

10 

d k +i t— Hj R n x Ay k +i 

> Scale innovation vector 

11 

end For all 


12 

For all 0 < k < N — 1 do in parallel 

> Perform the adjoint inte- 



grations in parallel 

13 

a k «- Mj k+1 b k +i 


14 

end For all 


15 

For all 1 < k < N — 1 do in parallel 

> Compute gradient using 



(141 

16 

V Xk £ <— b k + d k — a k 


17 

end For all 


18 

V X0 £ «- Bg 1 (x 0 - x£) - a 0 


19 

V Xn £ djsr + bw 


20 

end procedure 
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4.4 Updating Lagrange multipliers 


In order to accelerate the convergence of the optimization process we replace 


the standard updates (13) with the strategy proposed in m- The new update 
process uses information from the previous two iterations instead of just one 
iteration. The process takes the following steps: 

1. Choose A 0 and set = 1. 


2. Let xid be the solution obtained by solving the optimization problem (121 
for particular values of and A^L Apply the classical update (13) to 
obtain and A^ +1 L 


3. The updated Lagrange multiplier is obtained as follows: 

t {e+1} = * (l + y/l + 4^j , 

+ (^A) (x««> - x«>) + (Ak) (x<«> - 


It is important to note that above procedure requires the values of A from two 
successive outer iterations, namely £ and £ + 1. 


5 Numerical experiments 

We study the performance of the parallel implementation of augmented La- 
grangian 4D-Var algorithm using the Lorenz-96 model with 40 variables [19] 
and a shallow water on the sphere model with ~8,000 variables [ 29] . 

5.1 Lorenz-96 model 

The Lorenz-96 model m is given by 

^ = x k _i (x k+ i - x k _ 2 ) - x k +F, A; = 1,... ,40, (15) 

dt 

with periodic boundary conditions and the forcing term F = 8 [19] . We use 
synthetic observations generated by perturbing the reference trajectory with 
normal noise with mean zero and standard deviation of 5% of average mag¬ 
nitude of the reference solution. The background uncertainty is set to 8% of 
average magnitude of the reference solution. The background and observation 
error covariance matrices are assumed to be diagonal. A vector of equidistant 
components ranging from —2 to 2 was integrated forward in time for 200 time 
steps and the final state is taken as a reference initial condition for the experi¬ 
ments. 
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5.2 Shallow water model on the sphere 

The shallow water equations have been used extensively as a simple model of the 
atmosphere since they contain the essential wave propagation mechanisms found 
in general circulation models [25] . The shallow water equations in spherical 
coordinates are: 


du 

dt 


ICOS0 


du n du 

u— -\~ v cos 9 —— 
d X 69 



i tan 9 


g dh 

a cos 9 dX 


(16a) 


dv 

dt 


1 


a cos 9 


dv „dv 

u a\ +vemS m 



(tand 


g dh 

«+-«= °> 
a d9 


(16b) 


dh If d(hu) d(hv cos9)\ 
dt a cos 9 \ dX d9 J 


(16c) 


Here / is the Coriolis parameter given by / = 2fl sin 9, where fl is the angular 
speed of the rotation of the Earth, h is the height of the homogeneous atmo¬ 
sphere, u and v are the zonal and meridional wind components, respectively, 9 
and A are the latitudinal and longitudinal directions, respectively, a is the ra¬ 
dius of the earth and g is the gravitational constant. The space discretization is 
performed using the unstaggered Turkel-Zwas scheme mm- The discretized 
spherical grid has nlon=36 nodes in longitudinal direction and nlat=72 nodes 
in the latitudinal direction. The semi-discretization in space leads to a discrete 
model of the form ([6|. In (|6j) the zonal wind, meridional wind and the height 
variables are combined into the vector x £ R” with n = 3 x nlat x nlon. We 
perform the time integration using an adaptive time-stepping algorithm. For a 
tolerance of 10 -8 the average time step size of the time-integrator is 180 seconds. 
A reference initial condition is used to generate a reference trajectory. 

Synthetic observation errors at various times fk are normally distributed 
with mean zero and a diagonal observation error covariance matrix with entries 
equal to (Rk)i,i = 1 for u and v components and (Rk)u = 10 6 for h compo¬ 
nents. The Rk values correspond to a standard deviation of 5% for u and v 
components, and 2% for h component. We construct a flow dependent back¬ 
ground error covariance matrix as described in M- The standard deviation 
of the background errors for the height component is 2% of the average magni¬ 
tude of the reference height component in the reference initial condition. The 
standard deviation of the background errors for the wind components is 15% of 
the average magnitude of the reference wind component in the reference initial 
condition. 


5.3 Experimental setup 

All numerical experiments are carried out in Matlab. The parallel imple¬ 
mentations are developed using Matlab’ s parallel toolbox. We compare the 
performance of the proposed parallel implementation with that of the standard 
4D-Var. The accuracy of numerical solutions is measured by the root mean 
square error (RMSE) with respect to a reference solution. The RMSE is given 
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by 


RMSE = 


\ 


1 

N 


N 

E 

k =1 


Nvar 


^reference I 


(17) 


where the reference solution x reference and the analysis x a are propagated for¬ 
ward in time using the full model, and the difference is measured at all times 
throughout the assimilation window. 

Computing the cost functions and gradients is the most important aspect 
of the 4D-Var algorithm and hence it is necessary that their computations are 
scalable. To evaluate the gradient and cost function we carry out numerical in¬ 
tegrations of the forward and adjoint models using MATLODE [4]. MATLODE 
is a Matlab version of FATODE, which was developed in Fortran [55]. The 
optimization is carried out using the L-BFGS-B solver [T5] implemented in the 
Poblano optimization toolbox developed at Sandia National Laboratory m 


5.4 Results with the Lorenz-96 model 

Figure [l] illustrates the convergence of augmented Lagrangian 4D-Var itera¬ 
tions. The intermediate solutions are discontinuous at sub-interval boundaries. 
The corresponding errors, with respect to the traditional 4D-Var solution, are 
shown in Figure [2j They decrease quickly to zero, showing that the augmented 
Lagrangian 4D-Var solution converges to the solution of traditional 4D-Var. 

The errors of the augmented Lagrangian and traditional 4D-Var solutions 
with respect to the reference solution are shown in Figure [3] The reference solu¬ 
tion is obtained by propagating the reference initial condition using the forward 


model in (151. The sequential 4D-Var requires 230 gradient and 574 cost func¬ 


tion evaluations, where as the parallel 4D-Var requires a total of 100 gradient and 
650 cost function evaluations for 6 observations. The Weak scalability results 
are presented in Figure [4j As the length of the assimilation window increases, 
the number of sub-intervals increases, and so does the number of observations 
(one per sub-interval). The number of cores on which the parallel algorithm is 
run increases such as to remain equal to the number of sub-intervals. The par¬ 
allel algorithm is scalable in weak sense: the total computational time increases 
very slowly with an increased problem size. The most time consuming calcu¬ 
lations are those of the cost function and gradient evaluations, which require 
running the forward and the adjoint models, respectively. The results shown 
in Figure [5] confirm the good weak scalability of the cost function and gradient 
computations. 


5.5 Results with the the shallow water model 

Figure [6] shows the weak scalability of cost function and gradient evaluations 
with the shallow water model. It can be seen that in both cases the parallel 
computational time is nearly constant with increasing problem size (number 
of sub-intervals) and a proportional increase in the number of cores (the num¬ 
ber of cores is equal to the number of sub-intervals). Figure [T] presents the 
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Figure 1: Lorenz-96 model (15) results. Convergence of the parallel augmented 
Lagrangian 4D-Var solution over the first several iterations. The final solution 
is computed with the traditional 4D-Var. Three different variables are shown 
as an illustration. 



Time 



Time 



Time 


Errors after iteration 1 - 

Errors after iteration 3 - 

Errors after iteration 5 - 


Figure 2: Lorenz-96 model (15) results. Difference between the augmented 
Lagrangian solutions at different iterations and the solution of traditional 4D- 
Var. Three different variables are shown as an illustration. 
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Figure 3: Lorenz-96 model (15) results. RMSE errors of traditional and aug¬ 
mented Lagrangian solutions with respect to the reference analysis. The two 
implementations give nearly identical results. Traditional 4D-Var approach re¬ 
quires 230 iterations, whereas the augmented Lagrangian requires 100 iterations. 



Figure 4: Lorenz-96 model ( p~5] > results. The scaling of overall computational 
times with increasing number of sub-intervals (which leads to an increase in the 
length of the assimilation window). 
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(a) Cost of one cost function evaluation (b) Cost of one cost gradient evaluation 
for increasing number of sub-intervals for increasing number of sub-intervals 


Figure 5: Lorenz-96 model (15) results. Weak scalability results for the cost 
function and gradient evaluations. 


work-precision diagrams, i.e., the evolution of solution accuracy (RMSE) with 
increasing number of iterations (increasing CPU time). The initial iterates of the 
parallel 4D-Var solution proceed rapidly, but afterwards the convergence slows 
down. At this stage the penalty parameter /j, fairly large and the optimization 
problem becomes more difficult for the LBFGS algorithm. The performance can 
be improved by replacing LBFGS with algorithms specially tailored to solve op¬ 
timization problems in the augmented Lagrangian framework [7] . An alternative 
strategy is to use a hybrid method: employ parallel 4D-Var for several itera¬ 
tions in the beginning, then continue with traditional serial 4D-Var. Here we 
perform two outer iterations with small values of /r and then use this solution as 
an initial guess for the serial 4D-Var. This strategy improves the performance 
considerably as seen in Figure [7(a)j Table [I] provides the computational times 
for parallel and serial 4D-Var algorithms. The serial 4D-Var for 9 observations 
requires 95 gradient and 200 cost function evaluations, whereas the parallel 4D- 
Var algorithm requires 350 gradient and 720 cost function evaluations. In the 
hybrid methodology, we perform 200 iterations of parallel 4D-Var and 30 itera¬ 
tions of serial 4D-Var for 9 observations. The final RMSE over the assimilation 
window for both the serial 4D-Var and hybrid methods is ~ 125. We notice 
a steady increase in speedup as the problem size (number of sub-intervals) is 
increased. It is possible to further improve the performance of the parallel al¬ 
gorithm by using second derivative information in the form of Hessian-vector 
products and employ Newton-type methods 0. 

6 Conclusions and future work 

This work presents an augmented Lagrangian framework to perform strong- 
constraint 4D-Var data assimilation in parallel. The assimilation window is 
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Observations 


Observations 


(a) Cost of one cost function evaluation (b) Cost of one gradient evaluation for in¬ 
fer increasing number of sub-intervals creasing number of sub-intervals 


Figure 6: Shallow water equations (16) results. Weak scalability results for the 
cost function and gradient evaluations. 



(a) Hybrid, parallel and serial (tradi¬ 
tional) 4D-Var 


Figure 7: Shallow water equations (16) results. Work-performance diagrams 
comparing parallel and hybrid methods with serial 4D-Var for 9 sub-intervals. 
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No. of sub-intervals 

Serial time [sec.] 

Parallel/Hybrid time [sec.] 

Speedup 

5 

8,515 

10,834 

0.786 

7 

15,745 

12,782 

1.232 

9 

22,277 

12,971 

1.756 


Table 1: Shallow water equations ( |T6| results. Performance comparison of aug¬ 
mented Lagrangian/parallel and traditional/serial 4D-Var and the correspond¬ 
ing speedups. 


split in sub-intervals; cost function and gradient evaluations, which are the 
main components of the algorithm, are performed by running the forward and 
the adjoint model in parallel across different sub-intervals. To the best of our 
knowledge this is the first work that uses an augmented Lagrangian approach to 
data assimilation, and the first one to propose a time-parallel implementation 
of strong-constraint 4D-var. 

Future work will focus on tuning the optimization procedure to improve 
performance on large scale problems, e.g., data assimilation with the weather 
research and forecast model [35]. The size of the control variable in the aug¬ 
mented Lagrangian framework increases with the number of sub-intervals and 
can become a bottleneck for the optimization. One possible strategy to overcome 
this is to perform optimization on a coarse grid and use the projected solution 
as an initial guess for the fine grid. A natural extension to our methodology is 
to use the augmented Lagrangian framework to expose space-time parallelism 
in the data assimilation problem such as to create more parallel tasks and im¬ 
prove the overall scalability. Space parallelism in a penalty formulation has been 
recently discussed in [12] . We will consider the use of optimization algorithms 
that are specifically tuned to work well in the augmented Lagrangian framework 
[7]. Next, Hessian information can be used to accelerate the convergence signif¬ 
icantly when the iterates are close to minima. In order to implement this it is 
useful to explore the construction of second order adjoint models that compute 
the Hessian-vector products in parallel. 
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